Learning for Transliteration of Arabic-Numeral Expressions Using Decision Tree for Korean TTS
Abstract
Despite much work on TTS technologies and several TTS systems customized for Korean, current TTS systems make many errors when transliterating non-alphabetic symbols such as Arabic numerals and text symbols. This paper proposes TLAN (Transliteration Learner for Arabic-Numeral expressions), which uses a decision tree to efficiently disambiguate the reading and meaning of Arabic Numeral Expressions (ANEs) in text. To analyze and learn from the data, three phases of learning elements are proposed: patterns of Arabic numerals combined with text symbols, contextual features, and heuristic information, each classified according to the senses and readings of ANEs. The corpus consisted of news articles published between January 1, 2000 and December 31, 2001 in 10 major Korean newspapers. By learning all three phases of learning elements, the model achieves accuracies of 97.38% on the training set and 97.28% on the test set.
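The abstract describes TLAN only at a high level, so the following is a minimal sketch, not the authors' method, of how a decision tree can choose a reading for an ANE from three kinds of features that loosely mirror the three phases: the numeral/symbol pattern, a contextual feature (the token that follows), and a length heuristic. It assumes a plain ID3-style tree with information gain; the abstract does not say which tree-induction algorithm, features, or reading classes TLAN uses. The reading classes (`digits`, `sino`, `native`), the feature names, the helper functions, and the toy training rows are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical reading classes: "digits" (read digit by digit, e.g. phone numbers),
# "sino" (Sino-Korean numerals), "native" (native Korean numerals before counters).
# (ANE, following token, reading class): toy data, NOT from the paper's corpus.
TRAIN = [
    ("010-1234-5678", "", "digits"),
    ("02-555-1234", "", "digits"),
    ("1,000", "", "sino"),
    ("3.14", "", "sino"),
    ("3", "개", "native"),
    ("5", "명", "native"),
    ("10", "시", "native"),
    ("2001", "년", "sino"),
    ("12", "월", "sino"),
    ("5,000", "원", "sino"),
]

def pattern_class(ane):
    """Phase 1 (illustrative): coarse shape of the numeral + text-symbol pattern."""
    for symbol, name in (("-", "hyphen"), (":", "colon"), (".", "dot"), (",", "comma")):
        if symbol in ane:
            return name
    return "plain"

def extract_features(ane, next_token):
    """Build one hypothetical feature vector per ANE."""
    n_digits = sum(ch.isdigit() for ch in ane)
    return {
        "pattern": pattern_class(ane),                   # phase 1: numeral/symbol pattern
        "counter": next_token,                           # phase 2: contextual feature
        "length": "short" if n_digits <= 2 else "long",  # phase 3: heuristic
    }

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, features):
    """ID3-style induction: split on the feature with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a reading class
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)  # internal node keeps a fallback class

def classify(tree, features):
    """Walk the tree; unseen feature values fall back to the node's majority class."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if features[feature] not in branches:
            return majority
        tree = branches[features[feature]]
    return tree

ROWS = [extract_features(ane, ctx) for ane, ctx, _ in TRAIN]
LABELS = [y for _, _, y in TRAIN]
TREE = build_tree(ROWS, LABELS, ["pattern", "counter", "length"])

if __name__ == "__main__":
    print(classify(TREE, extract_features("031-123-4567", "")))  # digits
    print(classify(TREE, extract_features("7", "개")))            # native
```

On this toy data the tree splits first on the following token, then on the symbol pattern when no counter follows, so a hyphenated number with no counter is read digit by digit. A real system would need far richer context features and much more data to approach the accuracies reported above.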
Similar resources
Automatic Transliteration and Back-transliteration by Decision Tree Learning
Automatic transliteration and back-transliteration across languages with drastically different alphabets and phoneme inventories, such as English/Korean, English/Japanese, English/Arabic, and English/Chinese, have practical importance in machine translation, cross-lingual information retrieval, automatic bilingual dictionary compilation, etc. In this paper, a bi-directional and to some exte...
Disambiguation Based on WordNet for Transliteration of Arabic Numerals for Korean TTS
Transliteration of Arabic numerals is not easily resolved. Arabic numerals occur frequently in scientific and informative texts and convey significant meaning. Since the readings of Arabic numerals depend largely on their context, generating accurate pronunciations of Arabic numerals is one of the critical criteria in evaluating TTS systems. In this paper, (1) contextual, pattern, and arithmetic f...
Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems
This study describes the tree-based modeling of prosodic phrasing, pause duration between phrases and segmental duration for Korean TTS systems. We collected 400 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. The phonemic and prosodic boundaries were manually marked on the recorded speech, and morphological analysis, grapheme-to...
Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system
Recently, corpus-based text-to-speech (CB-TTS) has been actively studied around the world. Statistical training methods are generally applied to learn prosodic rules in CB-TTS, and classification and regression trees (CART) are among the most widely used methods. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. Our idea comes from the fact that...
A New Method for Extracting Named Entities in Classical Arabic
In Natural Language Processing (NLP) studies, developing resources and tools contributes to the scope and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...